According to the IBGE, the majority of Brazilians are of mixed race. Racial and genetic admixture affects demographic information, and it is in the public interest that genetic information on demographic groups be used to improve the Unified Health System (SUS) of Brazil. Integrating genomic and ancestry information on Brazilians into public health data, such as the data provided by the Information Technology Department of SUS (DATASUS), can help estimate the genetic risk of diseases and support policies to improve diagnosis, resource allocation, services, and therapies for population groups at higher genetic risk. Here we introduce GERALDA, a computational framework in the R language that uses mitochondrial variants to estimate the genetic risk of neuroblastoma and neurodegenerative diseases in Brazilians. Using mitochondrial DNA variants, this work indicates an increased genetic risk of disease, with an impact on cognition, morbidity, and mortality of Brazilians. This information can be used in organizing the public health system, contributing to a rational use of its resources.
Our group identified hypoxia as a triggering signal for the cellular transition from adrenergic (ADRN) to mesenchymal (MES) cells in neuroblastoma. We found that the transition is mediated by the deposition of the epigenetic marker 5-hydroxymethyl-cytosine (5-hmC) via the ten eleven translocation enzyme (TET1), a functional mechanism of hypomethylation. We are investigating hypomethylation patterns via 5-hmC in immortalized cells, tumor samples and cell-free DNA (cfDNA) isolated from peripheral blood liquid biopsies of neuroblastoma patients. Repetitive genomic elements, or repetitive regions of the genome, are part of the non-coding region and are important for the maintenance of the pluripotent cellular state of stem cells, which we call the de-differentiated state. Mesenchymal stem cells (MES) are dedifferentiated and their gene expression pattern shows a high correlation with the stem and pluripotent cell state.
We evaluated the deposition of the epigenetic marker 5-hmC in repetitive regions of the genomes of cells, tumors, and cfDNA of neuroblastoma patients for the oncological management of patients. We aim to incorporate liquid biopsy with sequencing of 5-hmC in repetitive element markers into the routine of public precision medicine programs for neuroblastoma in Brazil. To this end, we propose the construction of a computational database that incorporates genomic and demographic data available in the health system, to allow better data analysis, machine learning, and artificial intelligence using omic marker data for the management of nervous system diseases in the public health system, together with the Center for Data Integration and Knowledge for Health of the Gonçalo Moniz Institute at Fiocruz in Bahia.
Neuroblastoma is a pediatric cancer of the peripheral nervous system. Tumors are composed of two main cell lineages: adrenergic (ADRN) and mesenchymal (MES) cells. These cells can interconvert using enhancers and superenhancers (van Groningen et al. 2017; Boeva et al. 2017), epigenetic markers in noncoding genome regions identified using machine learning by hierarchical clustering (van Groningen et al. 2017) and principal component analysis (Boeva et al. 2017). Despite the importance of the mechanism of ADRN-to-MES cell interconversion, the cellular signals that trigger the transition are not understood. ADRN cells are neuron-like and sensitive to chemotherapy (Figure 1, left), while MES cells are similar to undifferentiated or dedifferentiated stem cells (Figure 1, right) and are responsible for resistance to chemotherapy and immunotherapy (van Groningen et al. 2017; Kendsersky et al. 2022; Mabe et al. 2022).
Hypoxia is the condition of limited oxygen supply for tumor growth. Among extracellular signals that activate 5-hmC deposition via TET1, our group at the University of Chicago identified hypoxia as a signal that activates gene expression by 5-hmC deposition and methylation removal (Mariani et al., 2014). Our studies and those of other groups have shown that hypoxia drives dedifferentiation in neuroblastoma cells (Jögi et al. 2002; Mariani et al. 2014; Hains et al. 2022). We identified a functional epigenetic demethylation mechanism mediated by Ten Eleven Translocation (TET) family enzymes, and genes activated by hypoxia in the transition from the ADRN to MES cellular state (Chaves et al. 2024 - in preparation), (Figure 5). Our group’s results propose a functional mechanism of hypoxia-driven methylation removal and gene expression activation (Mariani et al., 2014; Hains et al., 2022; Chaves et al. 2024 - in preparation) (Figures 2 and 5). Our work suggests hypoxia as a mechanism for maintaining cellular dedifferentiation, consistent with the hypomethylation of repetitive transposable elements (Figure 5).
Most CpG islands are located in repetitive intergenic regions of the genome (Lanciano and Cristofari 2020) and, when methylated, serve to maintain an inactive state of DNA transcription (Fetahu and Taschner-Mandl 2021). Repetitive transposable DNA elements (TEs) are mobile genetic elements that make up a large fraction of the genome, reaching 15% in C. elegans and 85% in maize (Lanciano and Cristofari, 2020). These percentages show the potential of these elements as biomarkers (Lanciano and Cristofari, 2020). TEs have been considered “junk DNA” (Burns 2017; Ma et al. 2022) and their value in epigenetic modulation in neuroblastoma and as biomarkers is undetermined.
TEs are active in human embryonic stem cells and, through functional hypomethylation, participate in the activation of gene expression in a mechanism compatible with TET1 activity and 5-hmC deposition (Ma et al., 2022). Retrotransposons such as HERV-H demarcate topologically associated domains (TADs) in chromatin, which maintain the cellular pluripotency state (Zhang et al., 2019). These observations agree with the hypothesis of induction of a stem cell (dedifferentiated) state in hypoxia-activated ADRN cells, involving repetitive or transposable elements, which lead to the expression of MES genes (Figure 5). Important biomarkers can be identified among TE elements, since they represent a high fraction of the genome composition. Thus, the detection of repetitive elements of the genome in cfDNA isolated by liquid biopsy can help in the management of cancer patients (Figure 2).
Studies from our group have identified 5-hmC deposition patterns (Figure 3A) in tumors (Applebaum et al., 2019) and liquid biopsy cfDNA (Applebaum et al., 2020) using the nano 5-hmC-seal method developed by Dr. Chuan He from the University of Chicago (Figure 3B). We propose that the TET1 enzyme, activated by hypoxia, causes oxidation of methylated DNA sequences and formation of 5-hydroxymethyl-cytosine (5-hmC) hypomethylation regions (Figure 3A). Thus, in the present project, sequencing the 5-hmC marker in Brazil will aid in patient management, as done in the USA (Chennakesavalu, Moore, Chaves, et al. 2024). We postulate that it will be possible to quantify ADRN and MES phenotypes in tumors in Brazilians, as we have recently done (Vayani, Chaves et al. 2023).
Mitochondria are important organelles in cellular metabolism and the citric acid cycle and play a role in pluripotency and differentiation at the embryonic stage (Carey et al. 2015; Hensley, Wasti, and DeBerardinis 2013; Qing et al. 2012; Yoo et al. 2020). Mitochondrial DNA is small and maternally derived, with approximately 16,500 base pairs. Mitochondria provide the cellular supply of ATP for the nervous system and play an important role in diseases of this system, such as neuroblastoma and neurodegenerative diseases. In neuroblastoma, John Maris’ group conducted case-control studies in Caucasians from the USA and observed that mitochondrial haplogroups (genetic variants) are associated with a reduced risk of the disease (Chang et al. 2020). The same group observed that the mitochondrial single nucleotide polymorphism (SNP) rs2853493 is associated with the risk of neuroblastoma and affects the expression of the mitochondrial cytochrome B gene, MT-CYB (Chang et al. 2022).
In neurodegenerative diseases, Tranah and collaborators observed that, in elderly Caucasians, different mitochondrial haplogroups present an increased risk of developing dementia and cognitive decline (Tranah et al. 2012). The group reported that in African-Americans, haplogroup L1 represents a greater risk of developing dementia when compared to haplogroup L3, which is more common in this ethnic-racial group. Also for haplogroup L3, the variant p.V193I, a substitution in the ND2 gene, was associated with increased levels of amyloid plaque, a phenotype of Alzheimer’s disease (Tranah et al. 2014). Neuroblastoma arises during sympathetic neurogenesis. Neurogenesis and nervous system development pathways are suppressed in mitochondrial haplogroups in neuroblastoma, and this may be associated with the mechanism underlying the reduced risk reported for mitochondrial haplogroups by Chang et al. (2020). Given the importance of mitochondrial variation for dementia and neuroblastoma, computational methods need to be developed to quantify the genetic risk associated with mitochondrial sequences, especially in populations other than US Whites, for which data are scarce. The choice of mitochondrial sequences considers an existing cross-talk between mitochondrial metabolism and activation of repetitive genomic elements (Baeken, Moosmann, and Hajieva 2020; Larsen et al. 2017; Stoccoro and Coppedè 2021; Lopes 2020; Bravo et al. 2020) that has implications for the role of mitochondrial haplogroups in activation of the immune system, as discussed by Chang et al. (2020).
To allow demographic data on nervous system diseases to be contrasted in the context of the public health system of Brazil, we propose exploring mortality data on neuroblastoma and neurodegenerative diseases in the public database of the Informatics Department of the Ministry of Health of Brazil (DATASUS). This will allow the development of computational tools capable of estimating genetic risk for different Brazilian racial groups, a long-needed approach for the health system of Brazil, based on the identification of mitochondrial variants that confer a higher risk of neuroblastoma and neurodegenerative diseases, using data curated from the literature (Tranah et al. 2012; Tranah et al. 2014; Chang et al. 2020; Chang et al. 2022).
We aim to improve management and decision-making processes in health systems by developing tools capable of processing and analyzing data generated by such systems, investigating the dynamics that affect mortality in various morbidities, such as those that affect the nervous system as is the case of neuroblastoma. In Brazil, the Federal Constitution of 1988 established the Unified Health System, and after that, the Department of Informatics of the SUS (DATASUS) was created to organize the data collected by the SUS (Saldanha, Rocha Bastos, and Barcellos, 2019). More recently, the Programa Genomas Brasil was implemented in the country, aiming to sequence 100,000 nationals to inform precision medicine policies in the public health system of SUS. Our group verified the persistence of racial inequality in the risk and survival of neuroblastoma (Chennakesavalu et al. 2023). Due to the genetic admixture present in Brazil since the beginning of colonization, a considerable part of the Brazilian population is of mixed race or black, which raises questions about the incidence, risk and treatment of neuroblastoma patients in Brazil considering their racial identification. Brazil is marked by profound socioeconomic inequalities related to the ethnic-racial origin of the population, which include but are not limited to digital literacy (Araújo da Silva and Behar 2019), use of programming languages (Sano et al. 2024; Vera-Choqqueccota et al. 2024) for genomic science, and racial inequality in health. The latter was recently identified by the Longitudinal Study of Adult Health (ELSA-Brazil) (ELSA Brazil 2023). The creation of databases linked to the SUS for genomic research must therefore take into account the impact of Brazilian genetic admixture on both access to digital literacy and the health of Brazilians themselves. 
Considering the requirement of public resources for genomic studies and for the genomic literacy of researchers and scientists in Brazil, we propose the GERALDA framework as a concept to guide the integration of genomic information into the demographic database of SUS, as described in the Materials and Methods section.
The R package Microdatasus was used to access mortality data on neuroblastoma and neurodegenerative diseases as described by Freitas Saldanha et al. (2019).
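As a minimal sketch of this step, the code below shows how mortality records might be restricted to causes of death of interest once they have been retrieved. The helper `filter_by_icd10()`, the toy records, and the ICD-10 prefixes are illustrative assumptions, not part of microdatasus or of the original analysis. ICD-10 has no neuroblastoma-specific code, so the C74 and C47 prefixes are only a rough proxy. The commented lines show how `fetch_datasus()` and `process_sim()` could supply real data; the state and year there are placeholders.

```r
# Hypothetical helper: keep rows whose underlying cause of death (CAUSABAS)
# starts with any of the given ICD-10 prefixes. DATASUS stores codes without dots.
filter_by_icd10 <- function(deaths, prefixes, cause_col = "CAUSABAS") {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")
  deaths[grepl(pattern, deaths[[cause_col]]), , drop = FALSE]
}

# Toy records standing in for processed SIM-DO data (values are made up)
toy_deaths <- data.frame(
  CAUSABAS = c("C749", "G309", "I219", "G20", "C479"),
  RACACOR  = c("Parda", "Branca", "Preta", "Parda", "Branca"),
  stringsAsFactors = FALSE
)

# Assumed prefixes: C74/C47 (common primary sites of neuroblastoma),
# G30 (Alzheimer's disease), G20 (Parkinson's disease)
selected <- filter_by_icd10(toy_deaths, c("C74", "C47", "G30", "G20"))
print(selected)

# With network access, real data could be fetched along these lines (not run):
# library(microdatasus)
# raw    <- fetch_datasus(year_start = 2019, year_end = 2019, uf = "BA",
#                         information_system = "SIM-DO")
# deaths <- process_sim(raw)
# selected <- filter_by_icd10(deaths, c("C74", "C47", "G30", "G20"))
```

Matching on code prefixes keeps the selection transparent and easy to revise as the list of causes of interest is refined.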
Mitochondrial haplogroups were classified using Haplogrep3 as described in Schönherr et al. (2023). The classification generates a CSV file that can be used with other R packages to assess the genetic risk of neuroblastoma and neurodegenerative diseases associated with each haplogroup.
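One way the haplogroup calls could be linked to literature findings is sketched below. The sample IDs, the haplogroup labels, and the helper `annotate_call()` are hypothetical, and the notes are qualitative summaries of the studies cited in this text, with no effect sizes encoded. This is an assumption about how such a lookup might be organized, not the framework's actual implementation.

```r
# Toy Haplogrep-style calls (sample IDs and labels are hypothetical)
calls <- data.frame(
  SampleID   = c("S1", "S2", "S3", "S4"),
  Haplogroup = c("L1c", "L3e", "J1c", "H2a"),
  stringsAsFactors = FALSE
)

# Qualitative annotations taken from the studies cited in the text
risk_notes <- data.frame(
  Major = c("L1", "L3", "J", "K"),
  Note  = c("higher dementia risk than L3 (Tranah et al. 2014)",
            "higher amyloid deposition with ND2 p.V193I (Tranah et al. 2014)",
            "related to dementia risk (Tranah et al. 2012)",
            "related to dementia risk (Tranah et al. 2012)"),
  stringsAsFactors = FALSE
)

# Match each call to the longest annotated prefix of its haplogroup label
annotate_call <- function(h) {
  hits <- risk_notes$Major[startsWith(h, risk_notes$Major)]
  if (length(hits) == 0) return(NA_character_)
  best <- hits[which.max(nchar(hits))]
  risk_notes$Note[risk_notes$Major == best]
}
calls$Note <- vapply(calls$Haplogroup, annotate_call, character(1), USE.NAMES = FALSE)
print(calls)
```

Haplogroups without an annotation, such as H2a in this toy input, receive `NA` rather than an implied "no risk" label.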
Risk was estimated and plotted using geobr as described by Pereira and Goncalves (2024). Geobr is a computational package for downloading official spatial data sets of Brazil. The package includes a wide range of geospatial data in geopackage format (like shapefiles), available at various geographic scales and for various years with harmonized attributes, projection and topology. This allows a spatial-geographic organization of the data provided by the DATASUS department of the Ministry of Health.
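The sketch below illustrates how per-region values might be attached to geobr geometries for plotting. The numeric values are placeholders, not results. The Portuguese region names and the `name_region` column are assumptions about the output of `geobr::read_region()`, and the function `plot_region_values()` is hypothetical. The map itself is not drawn by default because it requires geobr, sf, ggplot2, and a network connection.

```r
# Assumed mapping from the English region labels used in this analysis
# to the Portuguese names expected in geobr's `name_region` column
region_lookup <- c(Northeast = "Nordeste", South = "Sul", Southeast = "Sudeste")

# Toy per-region values (placeholders, not results)
region_values <- data.frame(
  Region = c("Northeast", "South", "Southeast"),
  value  = c(0.2, 0.5, 0.3),
  stringsAsFactors = FALSE
)
region_values$name_region <- unname(region_lookup[region_values$Region])
print(region_values)

# Hypothetical plotting function: joins the values to the region polygons
# and draws a choropleth. Regions without a value are left unfilled.
plot_region_values <- function(values) {
  regions_sf <- geobr::read_region(year = 2010)
  map_df <- merge(regions_sf, values, by = "name_region", all.x = TRUE)
  ggplot2::ggplot(map_df) +
    ggplot2::geom_sf(ggplot2::aes(fill = value)) +
    ggplot2::theme_void()
}
# plot_region_values(region_values)  # not run: downloads data
```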
The adoption of genomic information and epigenetic markers, such as the 5-hmC marker in neuroblastoma, into the SUS system involves challenges, which necessarily include activities to support digital literacy and the use of computer programming technologies in genomics throughout the country. Thus, this project includes, in the methodological part, collaborative genomic research between CIDACS in Brazil and my doctoral and postdoctoral institutions in the USA, to mediate the teaching of computer programming languages for genomic research and the construction of the database for the application of machine learning to demographic data from the SUS. CIDACS proposed the harmonization of databases of social and health indicators, creating the Cohort of 100 million Brazilians and making important contributions to national health and epidemiology (Barreto et al. 2021). Integrating genomic information databases into the pipelines that allow investigations of social and demographic data in Brazil can enrich and improve public policies of the national public health system. It also has the potential to generate protocols for the use of machine learning and artificial intelligence in disease classification in the health system.
Figure 1: Framework of the GERALDA pipeline. Starting with fasta or fastq sequences, samples are aligned to the reference genome. Once VCF files are produced out of each sample, custom scripts are used to extract the genotypes of interest that will be used as features to inform the machine learning algorithm to classify discrete categories. Each haplotype thus identified is then used to label a racial group in the Microdatasus dataframe. This information is then used to estimate the genetic risk of each racial group in the DATASUS dataframe.
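The genotype extraction step in this pipeline can be pictured with the following base R sketch. The VCF lines are a toy single-sample example, the helper `extract_gt()` is hypothetical, and the feature names are only loosely modeled on the `MT_16093_bp` column that appears later in the text. The custom scripts and genotype encoding actually used by GERALDA are not specified here.

```r
# Toy single-sample VCF content (tab-separated); values are illustrative only
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
  "chrM\t16093\t.\tT\tC\t.\tPASS\t.\tGT\t1",
  "chrM\t16223\t.\tC\tT\t.\tPASS\t.\tGT\t0"
)

# Return the GT field of the first sample at a given position, or NA if absent
extract_gt <- function(lines, pos) {
  body <- lines[!startsWith(lines, "#")]
  fields <- strsplit(body, "\t", fixed = TRUE)
  for (f in fields) {
    if (as.integer(f[2]) == pos) {
      fmt_keys    <- strsplit(f[9], ":", fixed = TRUE)[[1]]
      sample_vals <- strsplit(f[10], ":", fixed = TRUE)[[1]]
      return(sample_vals[match("GT", fmt_keys)])
    }
  }
  NA_character_
}

# One feature per position of interest; absent positions stay NA
features <- c(MT_16093_bp = extract_gt(vcf_lines, 16093),
              MT_16223_bp = extract_gt(vcf_lines, 16223),
              MT_16311_bp = extract_gt(vcf_lines, 16311))
print(features)
```

Positions that are not present in a sample's VCF are kept as `NA` so that missing calls are not confused with reference genotypes.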
Racial self-classification has social and political consequences in Brazil. Although it is acknowledged that the country has a historical genetic admixture involving Portuguese and other European, African, and Native American populations, Brazil struggles with racial inequality in health (Chor and Araujo Lima 2005). Racial classification has been difficult to achieve in Brazil because of this admixture. Among the parameters proposed for racial classification are ancestry, self-identification, and third-person identification. The IBGE adopted self-declaration for racial classification (Chor and Araujo Lima 2005). Another racial classification system is genetic or genomic ancestry. Genomic ancestry investigated using mitochondrial sequences can also inform the genetic risk for diseases of the nervous system such as neurodegenerative diseases and neuroblastoma. To investigate the ancestry of the self-declared white group in Brazil, we used Haplogrep3 to classify the matrilineal lineage of self-declared white Brazilians, aiming to quantify genetic risks for variants known to affect the nervous system (Figure 2).
Figure 2: Identification of mitochondrial haplogroups in sequences of self-declared white Brazilians, identified using the Haplogrep 3 tool (Schönherr, Weissensteiner, and Kronenberg 2023). The sample of mitochondrial sequences from Brazilians presents haplogroups J and K. These genotypes were related to the risk of dementia in the 2012 Tranah study in individuals of European ancestry (blue). In individuals of African ancestry (purple), haplogroup L1 is identified, which presents an increased risk of developing dementia, according to Tranah 2014. Also according to the 2014 Tranah study, the most common haplogroup among people of African ancestry, haplogroup L3, which is also observed in this sample from Brazil with 4 individuals (4 counts), presents higher levels of amyloid plaque deposition. This suggests that these individuals represent a risk group for the development of dementia among Brazilians.
The genotypes in Figure 2 are the output of the Haplogrep 3 tool. They can be summarized in a table with the number of sequences per mitochondrial continental origin and per region of Brazil, generated as follows.
library(dplyr)   # left_join(), select(), %>%
library(expss)   # cross_cases()
# Haplogrep 3 output: one row per sample
haplogroups.extended <- read.table("data/haplogroups/haplogroups.extended.csv", header = TRUE)
# Sample metadata: continental origin and Brazilian region for each sample ID
regions <- read.table("../../ReComBio Scientific/geraldo/data/regions.txt", sep = "\t", header = TRUE)
haplogroups.regions <- dplyr::left_join(haplogroups.extended, regions, by = c("SampleID" = "ID"))
haplogroups.regions <- haplogroups.regions %>% dplyr::select(SampleID, Haplogroup, Origin, Region, Found_Polys)
# Cross-tabulate continental origin by region
cross_cases(haplogroups.regions, Origin, Region)
| Origin | Northeast | South | Southeast |
|---|---|---|---|
| African | 18 | | 30 |
| Amerindian/Asian | 7 | | 15 |
| European | 14 | 17 | |
| #Total cases | 39 | 17 | 45 |
The haplogroups.regions object, from which the table above was generated, is used next to investigate the risk of neuroblastoma and neurodegenerative diseases per region of Brazil, based on the genotypes of the mitochondrial DNA sequences sampled in each large region.
library(dplyr)    # left_join(), select(), %>%
library(stringr)  # str_sub()
library(tibble)   # add_column()
# Per-sample genotype features, including the MT_16093_bp column
mt_features_df <- read.table("../../ReComBio Scientific/geraldo/data/mt_dcast_gt_lab.txt", header = TRUE)
haplogroups.features <- dplyr::left_join(haplogroups.regions, mt_features_df, by = c("SampleID" = "fastaID"))
haplogroups.features <- haplogroups.features %>% dplyr::select(SampleID, Haplogroup, Origin, Region, MT_16093_bp, Found_Polys)
# Keep only the first two characters of each haplogroup label
split_haplo_vector <- str_sub(haplogroups.features$Haplogroup, start = 1, end = 2)
# Insert the shortened label right after Haplogroup, then drop the full label
haplogroups.features <- add_column(haplogroups.features, Genotype = split_haplo_vector, .after = "Haplogroup")
haplogroups.features <- haplogroups.features %>% dplyr::select(-Haplogroup)
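To make the effect of the two-character truncation concrete, here is a small base R illustration. The input labels are illustrative examples of haplogroup names of different lengths, not samples from this study, and `substr()` is used as a base R stand-in for `stringr::str_sub()`.

```r
# Example haplogroup labels of different lengths (illustrative)
full_labels <- c("L3e1", "H2a2a1", "A+152", "J", "K1a")

# Base R equivalent of stringr::str_sub(x, start = 1, end = 2)
short_labels <- substr(full_labels, 1, 2)
print(short_labels)
# "L3" "H2" "A+" "J"  "K1"
```

Note that single-letter labels such as J are kept as they are, so J and J1 remain separate categories, and labels such as A+152 are shortened to A+.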
The following code lists the polymorphisms found in each sample and then counts the occurrences of each haplogroup in Brazil as a whole:
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box()
df_selected <- haplogroups.features %>% dplyr::select(SampleID, Genotype, Region, Found_Polys)
kable(df_selected, caption = "Found Polymorphisms") %>%
  kable_styling("striped", full_width = FALSE, font_size = 12) %>%
  scroll_box(width = "100%", height = "600px")
| SampleID | Genotype | Region | Found_Polys |
|---|---|---|---|
| AF243627 | A2 | Northeast | 152C! 16111T 16126C 16223T 16259T 16290T 16319A 16362C |
| AF243628 | G1 | Northeast | 16223T 16325C 16362C |
| AF243629 | B4 | Northeast | 16189C 16217C |
| AF243630 | B2 | Northeast | 16189C 16217C 16249C 16312G 16344T |
| AF243631 | A+ | Northeast | 16223T 16290T 16319A 16362C |
| AF243632 | C1 | Northeast | 16223T 16298C 16325C 16327T 16362C |
| AF243633 | M7 | Northeast | 16223T 16295T 16362C |
| AF243634 | L1 | Northeast | 1438G! 15301A! 16126C 16129A! 16187T 16189C 16223T 16264T 16270T 16278T 16293G 16311C |
| AF243635 | L3 | Northeast | 16176T 16223T 16327T |
| AF243636 | M4 | Northeast | 6131G! 16223T 16294T 16294T |
| AF243637 | L3 | Northeast | 16223T 16327T |
| AF243638 | L0 | Northeast | 73G! 146C! 182T! 195C! 263G! 15301A! 16129A 16148T 16168T 16172C 16187T 16188G 16189C 16223T 16230G 16278T! 16311C 16320T |
| AF243639 | L2 | Northeast | 150T! 182T! 16189C 16192T 16223T 16278T 16294T 16309G 16311C! |
| AF243640 | L3 | Northeast | 16124C 16223T |
| AF243641 | L3 | Northeast | 16185T 16223T 16327T |
| AF243642 | L2 | Northeast | 16223T 16264T 16278T 16311C! |
| AF243643 | L2 | Northeast | 150T! 182T! 16189C 16223T 16225T 16234T 16278T 16294T 16309G 16311C! |
| AF243644 | L1 | Northeast | 15301A! 16129A 16187T 16189C 16214T 16223T 16265C 16278T 16291T 16294T 16311C 16360T |
| AF243645 | L1 | Northeast | 15301A! 16129A 16187T 16189C 16223T 16265C 16278T 16286G 16294T 16311C 16360T |
| AF243646 | L2 | Northeast | 150T! 182T! 16223T 16278T 16294T 16309G 16311C! |
| AF243647 | L0 | Northeast | 73G! 146C! 182T! 195C! 263G! 15301A! 16129A 16148T 16168T 16172C 16187T 16188G 16189C 16223T 16230G 16278T! 16311C 16320T |
| AF243648 | L3 | Northeast | 16124C 16223T |
| AF243649 | L3 | Northeast | 16172C 16223T 16327T |
| AF243650 | L2 | Northeast | 150T! 182T! 16223T 16278T 16294T 16309G 16311C! |
| AF243651 | L3 | Northeast | 16129A 16209C 16223T 16292T 16295T 16311C |
| AF243652 | H1 | Northeast | 16309G |
| AF243653 | H1 | Northeast | 16362C |
| AF243654 | J | Northeast | 16069T 16126C |
| AF243655 | HV | Northeast | 16234T 16311C 16362C |
| AF243656 | H1 | Northeast | 16075C 16189C 16356C |
| AF243657 | K1 | Northeast | 16093C 16224C 16311C 16319A |
| AF243658 | H7 | Northeast | 16221T |
| AF243659 | K | Northeast | 16224C 16311C |
| AF243660 | H2 | Northeast | |
| AF243661 | H1 | Northeast | 16189C 16356C |
| AF243662 | T2 | Northeast | 16126C 16294T 16296T 16304C |
| AF243663 | H2 | Northeast | 16189C |
| AF243664 | V7 | Northeast | 16153A 16298C |
| AF243665 | H3 | Northeast | 16293G |
| AF243666 | L3 | Southeast | 750G! 16223T 16265T |
| AF243667 | L2 | Southeast | 16111A 16145A 16184T 16223T 16239T 16278T 16292T 16311C 16355T |
| AF243668 | L3 | Southeast | 750G! 16223T 16265T |
| AF243669 | L1 | Southeast | 15301A! 16086C 16129A 16187T 16189C 16223T 16241G 16274A 16278T 16291T 16293G 16294T 16311C 16360T |
| AF243670 | L3 | Southeast | 16185T 16209C 16223T 16327T |
| AF243671 | L2 | Southeast | 150T! 195C! 16223T 16224C 16278T 16311C! |
| AF243672 | L2 | Southeast | 16223T 16264T 16278T 16311C 16311C |
| AF243673 | L0 | Southeast | 73G! 146C! 182T! 185A! 195C! 263G! 15301A! 16129A! 16148T 16172C 16187T 16188G 16189C 16223T 16230G 16278T! 16311C 16320T |
| AF243674 | M5 | Southeast | 16223T 16278T 16294T |
| AF243675 | L1 | Southeast | 15301A! 16071T 16129A 16145A 16187T 16189C 16213A 16223T 16234T 16265C 16278T 16286G 16294T 16311C 16360T |
| AF243676 | U6 | Southeast | 16172C 16189C 16219G 16278T |
| AF243677 | U6 | Southeast | 16172C 16189C 16219G 16278T 16362C |
| AF243678 | L4 | Southeast | 5460A! 16223T 16293T 16311C 16355T 16362C |
| AF243679 | L2 | Southeast | 16114A 16129A 16213A 16223T 16278T 16311C! |
| AF243680 | L1 | Southeast | 1438G! 15301A! 16126C 16129A! 16187T 16189C 16223T 16264T 16270T 16278T 16311C |
| AF243681 | L4 | Southeast | 5460A! 16223T 16293T 16311C 16355T 16362C |
| AF243682 | X1 | Southeast | 16104T 16189C 16223T 16278T! |
| AF243683 | L1 | Southeast | 195C! 2283T! 7055G! 15301A! 16104T 16129A! 16163G 16187T 16189C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243684 | L3 | Southeast | 10398G! 16185T 16223T 16327T! |
| AF243685 | L0 | Southeast | 73G! 146C! 152C! 182T! 195C! 263G! 15301A! 16093C 16129A 16148T 16168T 16172C 16187T 16188G 16189C 16223T 16230G 16278T 16278T 16293G 16311C 16320T |
| AF243686 | L3 | Southeast | 16185T 16223T 16327T |
| AF243687 | L1 | Southeast | 15301A! 16129A 16187T 16189C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243688 | L1 | Southeast | 195C! 7055G! 15301A! 16129A 16163G 16187T 16189C 16209C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243689 | L1 | Southeast | 15301A! 16086C! 16129A! 16189C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243690 | L3 | Southeast | 16223T 16320T |
| AF243691 | L3 | Southeast | 16172C 16189C 16223T 16320T |
| AF243692 | M5 | Southeast | 16223T 16278T 16294T |
| AF243693 | L3 | Southeast | 16172C 16189C 16223T 16311C 16320T |
| AF243694 | L1 | Southeast | 198T! 10398G! 15301A! 16129A 16187T 16189C 16223T! 16278T 16293G 16294T 16311C 16360T |
| AF243695 | L3 | Southeast | 16172C 16189C 16223T 16320T |
| AF243696 | A+ | Southeast | 16189C 16223T 16290T 16319A 16362C |
| AF243697 | A2 | Southeast | 152C! 16097C 16098G 16111T 16223T 16290T 16319A 16362C |
| AF243698 | C1 | Southeast | 16223T 16325C 16327T |
| AF243699 | A2 | Southeast | 152C! 16111T! 16126C 16223T 16278T 16290T 16319A 16362C |
| AF243700 | A2 | Southeast | 152C! 16111T 16192T 16223T 16290T 16319A 16362C |
| AF243701 | A8 | Southeast | 16223T 16242T 16290T 16319A |
| AF243702 | B4 | Southeast | 16189C 16217C |
| AF243703 | B4 | Southeast | 16189C 16217C |
| AF243704 | G1 | Southeast | 16223T 16325C 16362C |
| AF243705 | C1 | Southeast | 16223T 16298C 16325C 16327T |
| AF243706 | A2 | Southeast | 152C! 16111T 16223T 16290T 16319A 16362C |
| AF243707 | B4 | Southeast | 16189C 16217C |
| AF243708 | C1 | Southeast | 16223T 16298C 16325C 16327T |
| AF243709 | B4 | Southeast | 16189C 16217C |
| AF243710 | A2 | Southeast | 152C! 16111T 16189C 16223T 16290T 16319A 16362C |
| AF243780 | H5 | South | 16304C |
| AF243781 | H1 | South | 16162G |
| AF243782 | U5 | South | 16144C 16189C 16192T! 16270T |
| AF243783 | T2 | South | 16126C 16153A 16294T 16296T |
| AF243784 | H2 | South | |
| AF243785 | J1 | South | 16069T 16126C 16261T |
| AF243786 | H2 | South | 16124C 16354T |
| AF243787 | H7 | South | 16213A |
| AF243788 | T2 | South | 16126C 16147T 16294T 16296T 16297C 16304C |
| AF243789 | H3 | South | 16093C |
| AF243790 | X2 | South | 16189C 16223T 16248T 16278T |
| AF243791 | HV | South | 16298C |
| AF243792 | U5 | South | 16189C 16192T! 16270T |
| AF243793 | K1 | South | 16224C 16311C 16319A |
| AF243794 | U7 | South | 16309G 16318T |
| AF243795 | J1 | South | 2706G! 16069T 16126C 16222T |
| AF243796 | R0 | South | 16126C 16362C |
haplogrupos_agregados <- df_selected %>% count(Genotype)
kable(haplogrupos_agregados, caption="Counts") %>%
kable_styling("striped", full_width = F, font_size = 12) %>%
scroll_box(width = "100%", height = "600px")
| Genotype | n |
|---|---|
| A+ | 2 |
| A2 | 6 |
| A8 | 1 |
| B2 | 1 |
| B4 | 5 |
| C1 | 4 |
| G1 | 2 |
| H1 | 5 |
| H2 | 4 |
| H3 | 2 |
| H5 | 1 |
| H7 | 2 |
| HV | 2 |
| J | 1 |
| J1 | 2 |
| K | 1 |
| K1 | 2 |
| L0 | 4 |
| L1 | 11 |
| L2 | 9 |
| L3 | 16 |
| L4 | 2 |
| M4 | 1 |
| M5 | 2 |
| M7 | 1 |
| R0 | 1 |
| T2 | 3 |
| U5 | 2 |
| U6 | 2 |
| U7 | 1 |
| V7 | 1 |
| X1 | 1 |
| X2 | 1 |
The next step computes the frequency of each haplogroup within each large region of Brazil. Here n is the number of samples carrying a genotype in a region, N is the total number of samples in that region, and freq = n/N:
haplogrupos_agregados <- df_selected %>%
  count(Region, Genotype) %>%   # n: samples per genotype within each region
  group_by(Region) %>%
  mutate(N = sum(n),            # N: total number of samples in the region
         freq = n / N) %>%
  ungroup()
kable(haplogrupos_agregados, caption = "Mitochondrial Haplogroups per Region") %>%
  kable_styling("striped", full_width = FALSE, font_size = 12) %>%
  scroll_box(width = "100%", height = "600px")
| Region | Genotype | n | N | freq |
|---|---|---|---|---|
| Northeast | A+ | 1 | 39 | 0.0256410 |
| Northeast | A2 | 1 | 39 | 0.0256410 |
| Northeast | B2 | 1 | 39 | 0.0256410 |
| Northeast | B4 | 1 | 39 | 0.0256410 |
| Northeast | C1 | 1 | 39 | 0.0256410 |
| Northeast | G1 | 1 | 39 | 0.0256410 |
| Northeast | H1 | 4 | 39 | 0.1025641 |
| Northeast | H2 | 2 | 39 | 0.0512821 |
| Northeast | H3 | 1 | 39 | 0.0256410 |
| Northeast | H7 | 1 | 39 | 0.0256410 |
| Northeast | HV | 1 | 39 | 0.0256410 |
| Northeast | J | 1 | 39 | 0.0256410 |
| Northeast | K | 1 | 39 | 0.0256410 |
| Northeast | K1 | 1 | 39 | 0.0256410 |
| Northeast | L0 | 2 | 39 | 0.0512821 |
| Northeast | L1 | 3 | 39 | 0.0769231 |
| Northeast | L2 | 5 | 39 | 0.1282051 |
| Northeast | L3 | 7 | 39 | 0.1794872 |
| Northeast | M4 | 1 | 39 | 0.0256410 |
| Northeast | M7 | 1 | 39 | 0.0256410 |
| Northeast | T2 | 1 | 39 | 0.0256410 |
| Northeast | V7 | 1 | 39 | 0.0256410 |
| South | H1 | 1 | 17 | 0.0588235 |
| South | H2 | 2 | 17 | 0.1176471 |
| South | H3 | 1 | 17 | 0.0588235 |
| South | H5 | 1 | 17 | 0.0588235 |
| South | H7 | 1 | 17 | 0.0588235 |
| South | HV | 1 | 17 | 0.0588235 |
| South | J1 | 2 | 17 | 0.1176471 |
| South | K1 | 1 | 17 | 0.0588235 |
| South | R0 | 1 | 17 | 0.0588235 |
| South | T2 | 2 | 17 | 0.1176471 |
| South | U5 | 2 | 17 | 0.1176471 |
| South | U7 | 1 | 17 | 0.0588235 |
| South | X2 | 1 | 17 | 0.0588235 |
| Southeast | A+ | 1 | 45 | 0.0222222 |
| Southeast | A2 | 5 | 45 | 0.1111111 |
| Southeast | A8 | 1 | 45 | 0.0222222 |
| Southeast | B4 | 4 | 45 | 0.0888889 |
| Southeast | C1 | 3 | 45 | 0.0666667 |
| Southeast | G1 | 1 | 45 | 0.0222222 |
| Southeast | L0 | 2 | 45 | 0.0444444 |
| Southeast | L1 | 8 | 45 | 0.1777778 |
| Southeast | L2 | 4 | 45 | 0.0888889 |
| Southeast | L3 | 9 | 45 | 0.2000000 |
| Southeast | L4 | 2 | 45 | 0.0444444 |
| Southeast | M5 | 2 | 45 | 0.0444444 |
| Southeast | U6 | 2 | 45 | 0.0444444 |
| Southeast | X1 | 1 | 45 | 0.0222222 |
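A quick consistency check for a per-region frequency table of this kind is sketched below in base R. The toy calls are illustrative and are not samples from this study. The assumption is that N denotes the number of samples in a region, in which case the frequencies within each region add up to 1.

```r
# Toy genotype calls per region (illustrative only)
toy <- data.frame(
  Region   = c("Northeast", "Northeast", "Northeast", "South", "South"),
  Genotype = c("L3", "L3", "H1", "H1", "J1"),
  stringsAsFactors = FALSE
)

# Counts per Region x Genotype, dropping combinations that never occur
counts <- as.data.frame(table(Region = toy$Region, Genotype = toy$Genotype),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
names(counts)[names(counts) == "Freq"] <- "n"

# N = number of samples in the region; freq = n / N
counts$N    <- ave(counts$n, counts$Region, FUN = sum)
counts$freq <- counts$n / counts$N

# Frequencies within each region should add up to 1
totals <- tapply(counts$freq, counts$Region, sum)
print(counts)
print(totals)
```

If the per-region totals differ from 1, the denominator is likely counting something other than samples, such as the number of distinct genotypes.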
The following table adds, in the MT_16093_bp column, the genotype of a single nucleotide polymorphism (SNP) for each sample:
| SampleID | Genotype | Origin | Region | MT_16093_bp | Found_Polys |
|---|---|---|---|---|---|
| AF243627 | A2 | Amerindian/Asian | Northeast | NA | 152C! 16111T 16126C 16223T 16259T 16290T 16319A 16362C |
| AF243628 | G1 | Amerindian/Asian | Northeast | NA | 16223T 16325C 16362C |
| AF243629 | B4 | Amerindian/Asian | Northeast | NA | 16189C 16217C |
| AF243630 | B2 | Amerindian/Asian | Northeast | NA | 16189C 16217C 16249C 16312G 16344T |
| AF243631 | A+ | Amerindian/Asian | Northeast | NA | 16223T 16290T 16319A 16362C |
| AF243632 | C1 | Amerindian/Asian | Northeast | NA | 16223T 16298C 16325C 16327T 16362C |
| AF243633 | M7 | Amerindian/Asian | Northeast | NA | 16223T 16295T 16362C |
| AF243634 | L1 | African | Northeast | 0 | 1438G! 15301A! 16126C 16129A! 16187T 16189C 16223T 16264T 16270T 16278T 16293G 16311C |
| AF243635 | L3 | African | Northeast | 0 | 16176T 16223T 16327T |
| AF243636 | M4 | African | Northeast | 0 | 6131G! 16223T 16294T 16294T |
| AF243637 | L3 | African | Northeast | 0 | 16223T 16327T |
| AF243638 | L0 | African | Northeast | 0 | 73G! 146C! 182T! 195C! 263G! 15301A! 16129A 16148T 16168T 16172C 16187T 16188G 16189C 16223T 16230G 16278T! 16311C 16320T |
| AF243639 | L2 | African | Northeast | 0 | 150T! 182T! 16189C 16192T 16223T 16278T 16294T 16309G 16311C! |
| AF243640 | L3 | African | Northeast | 1 | 16124C 16223T |
| AF243641 | L3 | African | Northeast | 0 | 16185T 16223T 16327T |
| AF243642 | L2 | African | Northeast | 0 | 16223T 16264T 16278T 16311C! |
| AF243643 | L2 | African | Northeast | 0 | 150T! 182T! 16189C 16223T 16225T 16234T 16278T 16294T 16309G 16311C! |
| AF243644 | L1 | African | Northeast | 0 | 15301A! 16129A 16187T 16189C 16214T 16223T 16265C 16278T 16291T 16294T 16311C 16360T |
| AF243645 | L1 | African | Northeast | 1 | 15301A! 16129A 16187T 16189C 16223T 16265C 16278T 16286G 16294T 16311C 16360T |
| AF243646 | L2 | African | Northeast | 0 | 150T! 182T! 16223T 16278T 16294T 16309G 16311C! |
| AF243647 | L0 | African | Northeast | 0 | 73G! 146C! 182T! 195C! 263G! 15301A! 16129A 16148T 16168T 16172C 16187T 16188G 16189C 16223T 16230G 16278T! 16311C 16320T |
| AF243648 | L3 | African | Northeast | 0 | 16124C 16223T |
| AF243649 | L3 | African | Northeast | 0 | 16172C 16223T 16327T |
| AF243650 | L2 | African | Northeast | 0 | 150T! 182T! 16223T 16278T 16294T 16309G 16311C! |
| AF243651 | L3 | African | Northeast | NA | 16129A 16209C 16223T 16292T 16295T 16311C |
| AF243652 | H1 | European | Northeast | NA | 16309G |
| AF243653 | H1 | European | Northeast | NA | 16362C |
| AF243654 | J | European | Northeast | NA | 16069T 16126C |
| AF243655 | HV | European | Northeast | NA | 16234T 16311C 16362C |
| AF243656 | H1 | European | Northeast | NA | 16075C 16189C 16356C |
| AF243657 | K1 | European | Northeast | NA | 16093C 16224C 16311C 16319A |
| AF243658 | H7 | European | Northeast | NA | 16221T |
| AF243659 | K | European | Northeast | NA | 16224C 16311C |
| AF243660 | H2 | European | Northeast | NA | |
| AF243661 | H1 | European | Northeast | NA | 16189C 16356C |
| AF243662 | T2 | European | Northeast | NA | 16126C 16294T 16296T 16304C |
| AF243663 | H2 | European | Northeast | NA | 16189C |
| AF243664 | V7 | European | Northeast | NA | 16153A 16298C |
| AF243665 | H3 | European | Northeast | NA | 16293G |
| AF243666 | L3 | African | Southeast | NA | 750G! 16223T 16265T |
| AF243667 | L2 | African | Southeast | NA | 16111A 16145A 16184T 16223T 16239T 16278T 16292T 16311C 16355T |
| AF243668 | L3 | African | Southeast | NA | 750G! 16223T 16265T |
| AF243669 | L1 | African | Southeast | NA | 15301A! 16086C 16129A 16187T 16189C 16223T 16241G 16274A 16278T 16291T 16293G 16294T 16311C 16360T |
| AF243670 | L3 | African | Southeast | NA | 16185T 16209C 16223T 16327T |
| AF243671 | L2 | African | Southeast | NA | 150T! 195C! 16223T 16224C 16278T 16311C! |
| AF243672 | L2 | African | Southeast | NA | 16223T 16264T 16278T 16311C 16311C |
| AF243673 | L0 | African | Southeast | NA | 73G! 146C! 182T! 185A! 195C! 263G! 15301A! 16129A! 16148T 16172C 16187T 16188G 16189C 16223T 16230G 16278T! 16311C 16320T |
| AF243674 | M5 | African | Southeast | NA | 16223T 16278T 16294T |
| AF243675 | L1 | African | Southeast | NA | 15301A! 16071T 16129A 16145A 16187T 16189C 16213A 16223T 16234T 16265C 16278T 16286G 16294T 16311C 16360T |
| AF243676 | U6 | African | Southeast | NA | 16172C 16189C 16219G 16278T |
| AF243677 | U6 | African | Southeast | NA | 16172C 16189C 16219G 16278T 16362C |
| AF243678 | L4 | African | Southeast | NA | 5460A! 16223T 16293T 16311C 16355T 16362C |
| AF243679 | L2 | African | Southeast | NA | 16114A 16129A 16213A 16223T 16278T 16311C! |
| AF243680 | L1 | African | Southeast | NA | 1438G! 15301A! 16126C 16129A! 16187T 16189C 16223T 16264T 16270T 16278T 16311C |
| AF243681 | L4 | African | Southeast | NA | 5460A! 16223T 16293T 16311C 16355T 16362C |
| AF243682 | X1 | African | Southeast | NA | 16104T 16189C 16223T 16278T! |
| AF243683 | L1 | African | Southeast | NA | 195C! 2283T! 7055G! 15301A! 16104T 16129A! 16163G 16187T 16189C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243684 | L3 | African | Southeast | NA | 10398G! 16185T 16223T 16327T! |
| AF243685 | L0 | African | Southeast | NA | 73G! 146C! 152C! 182T! 195C! 263G! 15301A! 16093C 16129A 16148T 16168T 16172C 16187T 16188G 16189C 16223T 16230G 16278T 16278T 16293G 16311C 16320T |
| AF243686 | L3 | African | Southeast | NA | 16185T 16223T 16327T |
| AF243687 | L1 | African | Southeast | NA | 15301A! 16129A 16187T 16189C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243688 | L1 | African | Southeast | NA | 195C! 7055G! 15301A! 16129A 16163G 16187T 16189C 16209C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243689 | L1 | African | Southeast | NA | 15301A! 16086C! 16129A! 16189C 16223T 16278T 16293G 16294T 16311C 16360T |
| AF243690 | L3 | African | Southeast | NA | 16223T 16320T |
| AF243691 | L3 | African | Southeast | NA | 16172C 16189C 16223T 16320T |
| AF243692 | M5 | African | Southeast | NA | 16223T 16278T 16294T |
| AF243693 | L3 | African | Southeast | NA | 16172C 16189C 16223T 16311C 16320T |
| AF243694 | L1 | African | Southeast | NA | 198T! 10398G! 15301A! 16129A 16187T 16189C 16223T! 16278T 16293G 16294T 16311C 16360T |
| AF243695 | L3 | African | Southeast | NA | 16172C 16189C 16223T 16320T |
| AF243696 | A+ | Amerindian/Asian | Southeast | NA | 16189C 16223T 16290T 16319A 16362C |
| AF243697 | A2 | Amerindian/Asian | Southeast | NA | 152C! 16097C 16098G 16111T 16223T 16290T 16319A 16362C |
| AF243698 | C1 | Amerindian/Asian | Southeast | NA | 16223T 16325C 16327T |
| AF243699 | A2 | Amerindian/Asian | Southeast | NA | 152C! 16111T! 16126C 16223T 16278T 16290T 16319A 16362C |
| AF243700 | A2 | Amerindian/Asian | Southeast | NA | 152C! 16111T 16192T 16223T 16290T 16319A 16362C |
| AF243701 | A8 | Amerindian/Asian | Southeast | NA | 16223T 16242T 16290T 16319A |
| AF243702 | B4 | Amerindian/Asian | Southeast | NA | 16189C 16217C |
| AF243703 | B4 | Amerindian/Asian | Southeast | NA | 16189C 16217C |
| AF243704 | G1 | Amerindian/Asian | Southeast | NA | 16223T 16325C 16362C |
| AF243705 | C1 | Amerindian/Asian | Southeast | NA | 16223T 16298C 16325C 16327T |
| AF243706 | A2 | Amerindian/Asian | Southeast | NA | 152C! 16111T 16223T 16290T 16319A 16362C |
| AF243707 | B4 | Amerindian/Asian | Southeast | NA | 16189C 16217C |
| AF243708 | C1 | Amerindian/Asian | Southeast | NA | 16223T 16298C 16325C 16327T |
| AF243709 | B4 | Amerindian/Asian | Southeast | NA | 16189C 16217C |
| AF243710 | A2 | Amerindian/Asian | Southeast | NA | 152C! 16111T 16189C 16223T 16290T 16319A 16362C |
| AF243780 | H5 | European | South | 0 | 16304C |
| AF243781 | H1 | European | South | 0 | 16162G |
| AF243782 | U5 | European | South | 0 | 16144C 16189C 16192T! 16270T |
| AF243783 | T2 | European | South | 0 | 16126C 16153A 16294T 16296T |
| AF243784 | H2 | European | South | 0 | |
| AF243785 | J1 | European | South | 0 | 16069T 16126C 16261T |
| AF243786 | H2 | European | South | 0 | 16124C 16354T |
| AF243787 | H7 | European | South | 0 | 16213A |
| AF243788 | T2 | European | South | 0 | 16126C 16147T 16294T 16296T 16297C 16304C |
| AF243789 | H3 | European | South | 1 | 16093C |
| AF243790 | X2 | European | South | 0 | 16189C 16223T 16248T 16278T |
| AF243791 | HV | European | South | 0 | 16298C |
| AF243792 | U5 | European | South | 0 | 16189C 16192T! 16270T |
| AF243793 | K1 | European | South | 0 | 16224C 16311C 16319A |
| AF243794 | U7 | European | South | 0 | 16309G 16318T |
| AF243795 | J1 | European | South | 0 | 2706G! 16069T 16126C 16222T |
| AF243796 | R0 | European | South | 0 | 16126C 16362C |
These results show that the identity of self-declared white individuals in Brazil is not limited to individuals of European matrilineal ancestry, in possible agreement with the ideology of racial democracy proposed by Gilberto Freyre. They also suggest that individuals who self-identify as white, as well as public policy makers, should consider the genetic risk associated with African variants in self-declared white Brazilians. At this point it is not possible to establish cause and effect, but the predominance of self-declared white individuals in neuroblastoma mortality could arise either because self-declared white individuals use the public health system most often (evidence of structural racism) or because the genetic risk variants are also present in self-declared white individuals. We calculate an incidence of 30% for haplogroup L3 in the Northeast region (17 samples analyzed in total from Brazil) (Table above). Among these 17 samples, the L3 haplogroup is not present in self-declared white individuals from the Southeast and South of Brazil. Mitochondrial haplogroups of African ancestry are observed in self-declared white Brazilians (Figure 7), consistent with Fridman’s (2014) observation of 35% African matrilineal lineage in self-declared white individuals in Brazil.
On the other hand, while haplogroups L1 and L3 may confer a greater risk of nervous system diseases on individuals identified as white, haplogroup K, of European origin, is associated with a reduced risk of neuroblastoma according to Chang et al. (2020). This suggests that Brazilians who self-declare as white and carry the European mitochondrial genotype are less susceptible to neuroblastoma than individuals of African matrilineal lineage (Figure 7).
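As a minimal sketch of how per-region haplogroup frequencies such as those discussed above can be tabulated, the base R snippet below copies a few rows from the table of sequences (it is not the full dataset) and computes the share of samples carrying a given haplogroup within each region. The object name `samples_subset` and the helper `haplogroup_share()` are illustrative assumptions and are not part of GERALDA.

```r
# A handful of rows copied from the table of sequences above (not the full dataset).
samples_subset <- data.frame(
  SampleID = c("AF243635", "AF243637", "AF243642", "AF243652", "AF243666", "AF243667"),
  Genotype = c("L3", "L3", "L2", "H1", "L3", "L2"),
  Region   = c("Northeast", "Northeast", "Northeast", "Northeast", "Southeast", "Southeast"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fraction of samples in each region carrying `haplogroup`.
haplogroup_share <- function(df, haplogroup) {
  tapply(df$Genotype == haplogroup, df$Region, mean)
}

l3_share <- haplogroup_share(samples_subset, "L3")
print(l3_share)
```

Because only six rows are used here, the resulting shares illustrate the calculation and should not be read as estimates for the regions.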
To apply models such as the one described in Chaves et al., 2024 (in preparation) to genomic research on diseases of the nervous system in Brazilians, we propose a collaborative genomic research effort between UCSC and UChicago in the USA and CIDACS at Instituto Gonçalo Moniz in Brazil. To enable the collaboration, we accessed neuroblastoma mortality data for Brazilians using the Microdatasus R package. We found a predominance of deaths of self-declared white individuals in the first decade of the 2000s (Figure 3).
The prevalence of self-declared white individuals in neuroblastoma mortality data in Brazil can be attributed to the ideology of “whitening” (Pena et al. 2011) in the Brazilian racial identity (Mitchell 2022) and to gender asymmetry in interracial relationships in Brazil (Pena 2007). After 2010, a decrease in the proportion of mortality of self-declared white individuals was observed (Figure 3).
In the code below we plot neuroblastoma mortality in Brazil by year and self-declared race, using the dados_nb_appended_melted object, which is loaded from a previously saved RDS file.
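library(dplyr)    # provides the %>% pipe
library(ggplot2)
library(plotly)   # provides ggplotly()
# Load the long-format mortality table saved earlier
nb_mortality_df <- readRDS("../../ReComBio Scientific/geraldo/data/dados_nb_appended_melted.rds")
# Stacked bars rescaled to proportions within each year
p_nb <- nb_mortality_df %>%
  ggplot(aes(x = year, y = value,
             fill = raca_cor_factor)) +
  geom_bar(position = "fill", stat = "identity")
# Convert the static plot into an interactive one
ggplotly(p_nb)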
Figure 3: Annual mortality due to neuroblastoma in a sample of the Brazilian population between 2000 and 2015, made available by the SUS Information Technology Department, DATASUS. Mortality is calculated in R using the Microdatasus package. Artwork by @allison_horst
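In the plotting code, geom_bar(position = "fill") rescales the bars so that the categories within each year sum to 1. A minimal base R sketch of that normalisation is given below, assuming the long-format columns year, raca_cor_factor and value used in the plot. The counts and category labels are made-up placeholders for illustration only and are not DATASUS figures.

```r
# Placeholder counts for illustration only (NOT real DATASUS values).
toy_mortality <- data.frame(
  year            = c(2000, 2000, 2001, 2001),
  raca_cor_factor = c("Branca", "Parda", "Branca", "Parda"),
  value           = c(6, 4, 3, 9),
  stringsAsFactors = FALSE
)

# Divide each count by the total for its year, which is the rescaling
# that position = "fill" applies to the bars.
toy_mortality$proportion <- toy_mortality$value /
  ave(toy_mortality$value, toy_mortality$year, FUN = sum)

print(toy_mortality)
```

Computing the proportions explicitly makes it possible to report them in a table alongside the figure.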
Since the TET1/5-hmC hypomethylation genomic model can only be applied to Brazilians after sequencing 5-hmC from cfDNA of Brazilian patients, the current MVP is a proof of concept and analyzes mitochondrial DNA of Brazilians published by Alves-Silva et al., 2000.
The prevalence of self-declared white individuals in the neuroblastoma mortality data (Figure 3) is in line with the African ancestry (indicated in purple) of haplogroups L1 and L3 in self-declared white individuals in Brazil and with the higher risk of nervous system diseases associated with these haplogroups according to Tranah (Tranah et al. 2012; Tranah et al. 2014) (Figure 2). Because we identified genetic variants associated with a higher risk of nervous system diseases in the Brazilian mitochondrial sequences, we decided to look at the incidence of these mitochondrial haplogroups per large geographic region of Brazil, using the geographic information present in the Alves-Silva et al. (2000) study. To do that, we accessed geographic information about the large regions and states of the Brazilian territory using the R package geobr and its read_state function, as follows:
library(geobr)   # provides read_state()
library(dplyr)   # provides left_join()
# Per-state table (the UF column holds the state abbreviation)
data_estados_brancos_perct <- readRDS("../../ReComBio Scientific/geraldo/data/data_estados_brancos_perct.rds")
# Read the geometries of all states
states <- read_state(
  year = 2019,
  showProgress = FALSE
)
# Join the state geometries to the per-state data on the state abbreviation
states_perct_brancos <- dplyr::left_join(states, data_estados_brancos_perct, by = c("abbrev_state" = "UF"))
To begin looking into the spatial and geographic information in the demographic data stored by DATASUS, we visualized the mortality rate of self-declared white individuals in the states of Brazil as follows. We obtained the number of self-declared white individuals with estimated dementia risk in each state. With this number, we can estimate the number of self-declared white individuals reported as deceased by the Unified Health System (SUS) in 2014. We observe that, proportionally, more self-reported white individuals passed away in Southern Brazil than in any other large region:
# states_perct_brancos <- readRDS("./data/states_perct_brancos.rds")
ggplot() +
  geom_sf(data = states_perct_brancos, aes(fill = brancos),
          color = "black", size = .15) + # color sets the border line
  labs(subtitle = "", size = 8) +
  scale_fill_distiller(palette = "Blues", name = "Mortality\nRate", direction = +1,
                       limits = c(0, 1)) +
  theme_minimal() #+no_axis
Figure 4: Frequency of mortality of self-declared white individuals in Brazil
Having established where the mortality rate of self-reported white individuals is highest, we begin assessing the incidence of the mitochondrial haplogroups that confer protection against or risk of diseases of the nervous system. Table 3 shows the incidence of each haplogroup identified in the mitochondrial sequences isolated from self-declared white Brazilians. The Region column in that table can be used to merge the genotype information with the demographic information contained in the following table, extracted using the Microdatasus R package.
dados_estados_nb_2013 <- readRDS("../../R/R journal/data/dados_estados_nb_2013.rds")
dim(dados_estados_nb_2013)
[1] 302 8
# kable(dados_estados_nb_2013, caption="dados_estados_nb_2013") %>%
# kable_styling("striped", full_width = F, font_size = 12) %>%
# scroll_box(width = "100%", height = "600px")
head(dados_estados_nb_2013)
CONTADOR RACACOR DTOBITO CAUSABAS DTNASC IDADE UF Region
6222 6222 1 23022013 C749 30122003 409 TO North
31026 24155 1 16022013 C749 09022012 401 SP Southeast
44075 37204 <NA> 09042013 C749 25102009 403 SP Southeast
44097 37226 2 08082013 C749 03042012 401 SP Southeast
46745 39874 1 10012013 C749 22031953 459 SP Southeast
48051 41180 1 19012013 C749 02032011 401 SP Southeast
Note that both Table 3 and dados_estados_nb_2013 have a column named Region, which holds the large regions of Brazil. Similar columns can be used to store information about the state, municipality, city and health unit serving the patient in the public health system.
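A minimal base R sketch of such a merge on the Region column follows. The death records are the six rows printed by head(dados_estados_nb_2013) above, and the haplogroup row is the Southeast X1 row of the haplogroup table shown earlier. The object names and the haplogroup column names (Haplogroup, n, total, freq) are assumptions made for this example.

```r
# The six death records printed by head(dados_estados_nb_2013) above.
deaths <- data.frame(
  UF     = c("TO", "SP", "SP", "SP", "SP", "SP"),
  Region = c("North", "Southeast", "Southeast", "Southeast", "Southeast", "Southeast"),
  stringsAsFactors = FALSE
)

# Count deaths per large region.
deaths_by_region <- aggregate(
  list(n_deaths = rep(1, nrow(deaths))),
  by = list(Region = deaths$Region),
  FUN = sum
)

# One row of the haplogroup table (Southeast, X1, 1 of 14 samples);
# the column names are assumptions made for this sketch.
hap_by_region <- data.frame(
  Region = "Southeast", Haplogroup = "X1", n = 1, total = 14,
  stringsAsFactors = FALSE
)
hap_by_region$freq <- hap_by_region$n / hap_by_region$total

# Left join on Region: regions without haplogroup data keep NA.
merged <- merge(deaths_by_region, hap_by_region, by = "Region", all.x = TRUE)
print(merged)
```

A left join is used so that regions with mortality records but no genotyped samples remain visible as missing values rather than being dropped.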
After estimating mortality by race and region in Brazil (Figure 4), we begin estimating the incidence of the mitochondrial haplogroups by geographic region. These haplogroups have been reported to be associated with genetic risk of diseases of the nervous system. We can now estimate which mitochondrial lineages are exposed to the highest risks across the territory of Brazil.
Haplogroup J is predominant in Southern Brazil, with a significant presence in the Northeast. Tranah et al. (2012) reported that haplogroup J is associated with cognitive impairment. Feder et al. (2008) found that this haplogroup is also associated with type 2 diabetes in Ashkenazi Jews, a disease known to be more frequent in people who have neurodegenerative diseases (Chaves et al., 2019).
Figure 5: Estimation of incidence of mitochondrial haplogroup J in populations of the large regions of Brazil.
Haplogroup K was found to be predominant in Southern and Northeastern Brazil in this study. Chang et al. (2020) reported that haplogroup K protects against neuroblastoma and is associated with protection against high-risk neuroblastoma. It is possible that this haplogroup is associated with an increased inflammatory response and T-cell infiltration in hot neuroblastoma tumors via mitochondrial reprogramming of metabolism in cancer and immune cells.
Figure 6: Estimation of incidence of mitochondrial haplogroup K in populations of the large regions of Brazil.
Haplogroup L3 is predominant in Southeastern and Northeastern Brazil. Tranah et al. (2014) reported that it is associated with cognitive impairment in African Americans. It is possible that this haplogroup is associated with increased mortality from neuroblastoma in self-declared white Brazilians of African mitochondrial ancestry.
Figure 7: Estimation of incidence of mitochondrial haplogroup L3 in populations of the large regions of Brazil.
Haplogroup T was detected in samples from Northeastern and Southern Brazil. According to a study by Kofler et al. (2009), mitochondrial DNA haplogroup T is associated with coronary artery disease and diabetic retinopathy. Chang et al. (2020) also reported an association between haplogroup T and neuroblastoma.
Figure 8: Estimation of incidence of mitochondrial haplogroup T in populations of the large regions of Brazil.
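The haplogroup associations cited in this section can be kept in a small lookup table and joined to the sample genotypes. The sketch below is illustrative only: `risk_lookup`, `macro_haplogroup()` and the choice to collapse sub-haplogroups such as J1 or T2 to their first letter are assumptions of this example, and the sample rows are a small subset copied from the table of sequences.

```r
# Associations as cited in the text (haplogroup -> reported finding).
risk_lookup <- data.frame(
  Haplogroup = c("J", "K", "L3", "T"),
  Reported   = c(
    "cognitive impairment (Tranah et al. 2012)",
    "reduced neuroblastoma risk (Chang et al. 2020)",
    "cognitive impairment in African Americans (Tranah et al. 2014)",
    "coronary artery disease, diabetic retinopathy (Kofler et al. 2009)"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep "L3" as is, otherwise collapse to the first letter
# (e.g. "J1" -> "J", "K1" -> "K", "T2" -> "T").
macro_haplogroup <- function(genotype) {
  ifelse(genotype == "L3", "L3", substr(genotype, 1, 1))
}

# A few rows copied from the table of sequences (not the full dataset).
samples_subset <- data.frame(
  SampleID = c("AF243785", "AF243793", "AF243783", "AF243635", "AF243781"),
  Genotype = c("J1", "K1", "T2", "L3", "H1"),
  Region   = c("South", "South", "South", "Northeast", "South"),
  stringsAsFactors = FALSE
)
samples_subset$Haplogroup <- macro_haplogroup(samples_subset$Genotype)

# Inner join: only samples whose haplogroup has a cited association remain.
annotated <- merge(samples_subset, risk_lookup, by = "Haplogroup")
print(table(annotated$Region, annotated$Haplogroup))
```

Keeping the associations in a separate table means new literature findings can be added without touching the genotype data.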
Construction of genetic risk databases can help decision-makers identify regions and individuals with higher genetic risk earlier, contributing to informed use of financial resources in the public health system.
Machine learning algorithms and artificial intelligence can contribute to improving public health policies by informing population groups that are at greater risk of disease, using large-scale datasets.
Mitochondrial genetic variants inform the risk of nervous system diseases such as neuroblastoma and neurodegenerative diseases.
We show that Brazilians of African mitochondrial matrilineal ancestry carry variants associated with increased risk of neurodegenerative diseases and do not carry protective neuroblastoma variants.
Mitochondrial sequences are simple enough to be integrated with public databases such as the database of the SUS IT department (DATASUS) and allow population stratification.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Chaves & Ramos, "GERALDA: A framework for integration of genomic information into the database of the Brazilian Health System", The R Journal, 2024
BibTeX citation
@article{quokka-bilby,
author = {Chaves, Gepoliano and Ramos, Pablo Ivan},
title = {GERALDA: A framework for integration of genomic information into the database of the Brazilian Health System},
journal = {The R Journal},
year = {2024},
issn = {2073-4859},
pages = {1}
}